Project supported by the National Natural Science Foundation of China (Grant Nos. 71501149 and 71231007), the Soft Science Project of Hubei Province, China (Grant No. 2017ADC122), and the Fundamental Research Funds for the Central Universities, China (Grant No. WUT: 2017VI070).
† Corresponding author. E-mail:
We study the stochastic evolutionary public goods game with punishment in a finite size population. Two kinds of costly punishments are considered, i.e., first-order punishment in which only the defectors are punished, and second-order punishment in which both the defectors and the cooperators who do not punish the defective behaviors are punished. We focus on the stochastic stable equilibrium of the system. In the population, the evolutionary process of strategies is described as a finite state Markov process. The evolutionary equilibrium of the system and its stochastic stability are analyzed by the limit distribution of the Markov process. By numerical experiments, our findings are as follows. (i) The first-order costly punishment can change the evolutionary dynamics and equilibrium of the public goods game, and it can promote cooperation only when both the intensity of punishment and the return on investment parameters are large enough. (ii) Under the first-order punishment, the further imposition of the second-order punishment cannot change the evolutionary dynamics of the system dramatically, but can only change the probability of the system to select the equilibrium points in the “C+P” states, which refer to the co-existence states of cooperation and punishment. The second-order punishment has limited roles in promoting cooperation, except for some critical combinations of parameters. (iii) When the system chooses “C+P” states with probability one, the increase of the punishment probability under second-order punishment will further increase the proportion of the “P” strategy in the “C+P” states.
Social dilemma problems are prevalent in the real world. They essentially depict the conflict between individual rationality and collective rationality.[1] The public goods game is a classic model used to describe this dilemma in multiplayer interactions. The “free riding” phenomenon it portrays has received widespread attention from scholars in economics,[2–6] psychology,[7–10] evolutionary biology,[11–13] complexity science,[14–18] and other disciplines. In the standard public goods game, participants have two choices: cooperation and defection. Cooperators participate in the investment, and defectors do not. The total investment of all cooperators is amplified and then distributed equally among all participants. If every individual chooses to cooperate, then everyone obtains the maximized benefit, and the system reaches the Pareto optimal state. However, due to the non-exclusive and non-competitive nature of public goods, defectors can obtain the benefits of the cooperators’ contributions by free riding, resulting in a higher net income than that of the cooperators. From the perspective of individual rationality, free riding is an optimal strategy for each individual. From the evolutionary point of view, regardless of the cooperators’ proportion in the initial population, the cooperation strategy will eventually be replaced by the defection of the free riders. The public goods game therefore cannot sustain highly efficient social cooperation.
In order to improve efficiency and promote the evolution of cooperation, scholars have proposed several mechanisms to reduce free riding behavior, such as the reward and punishment mechanism,[13,19–25] reputation mechanism,[12,26–31] optional participation mechanism,[18,25,32–36] spatial interaction mechanism,[14,37–42] etc.[43–45] We focus on the punishment mechanism in this paper. Punishment, as an efficient way to change human behavior, is regarded as an important mechanism for maintaining cooperation in human society. Through behavioral experiments, experimental economists have found that punishment and reward can significantly enhance the level of cooperation in the public goods game.[46] Different forms of punishment have also attracted the broad interest of scholars. Depending on which party carries out the punishment, punishment can be divided into three cases: first-party punishment, second-party punishment, and third-party punishment. First-party punishment refers to the uncomfortable feelings an individual experiences when it violates the group’s norms or defects. It is an inner punishment by one's own conscience, such as guilt, shame, or embarrassment.[47] Second-party punishment refers to the punishment imposed on the free riders by others involved in the game.[21,22] Third-party punishment refers to the punishment imposed on the free riders by spectators.[23,48] The spectators here can be a completely independent external group, such as the judicial system, or a potential group that the defectors may meet in the future. The reputation mechanism can be seen as a special case of third-party punishment. In this paper, we study costly second-party punishment in the public goods game. Punishment is treated as a strategy available to participants. Participants who choose this strategy reduce the benefits of some other types of strategies, at an additional cost to themselves.
This study considers two forms of second-party punishment: punishing only the defectors (first-order free riding behavior), and punishing not only the defectors but also the cooperators who do not punish defective behavior (second-order free riding behavior). In the following, these two forms are called first-order punishment and second-order punishment, respectively.
Evolutionary games, which can effectively describe the evolutionary process of strategies, have been widely used to study the evolution of cooperation in social dilemmas. In evolutionary games, participants are only assumed to be boundedly rational. The model is based on groups and describes how individuals update their strategies according to their incomes. For the public goods game, the replicator dynamics was used in Ref. [12] to study the influence of reward on the cooperative behavior of the population, and the result was compared with the punishment mechanism. The replicator dynamics is a deterministic evolution model in which the number of individuals in the population is assumed to be infinite. Thus, differential equations are used to describe the evolutionary process of different strategies in the system. This model identifies the evolutionary stable states of the system by analyzing the stability of the differential equations at the equilibrium points. A stable equilibrium point is an evolutionary stable strategy (ESS) of the game. In order to describe a finite size population and the randomness of the system, the public goods game was studied in Ref. [49] based on the Moran process, which is widely used in ecology. That study investigated the influence of reward on the cooperative behavior of the finite size population. In the Moran-based model, the fixation probability was used to analyze the evolutionary dynamics of the system for only two strategies. This analysis generally requires the assumption of weak selection.
In order to describe the continuous noise effects caused by mutation in the evolutionary process of the system, Young and Foster[50,51] introduced the concept of stochastic stable equilibrium (SSE). This concept can better describe the evolutionary stability of a strategy in a stochastic environment. However, since the concept of SSE was proposed, only a few articles have been devoted to the SSE of games. As far as we know, Quan et al.[52,53] recently studied the SSE of evolutionary games in a finite size population based on the Markov process. Other models are mostly based on stochastic differential equations under the assumption of infinite size populations, as in the research by Huang et al.[54] and Liang et al.[55] Unlike the above-mentioned evolution models that aim at the ESS of the game, we study the SSE of the public goods game with punishment in a finite size population. We adopt the stochastic evolution model proposed in Ref. [56], where the evolutionary process of strategies in the population is described as a finite state multidimensional Markov process. The stochastic stability of the system under different punishment parameters is analyzed by the limit distribution of the Markov process. By analyzing the SSE of this stochastic system, the influence of the punishment mechanism on the cooperative behavior of the population is revealed.
The remainder of the paper is organized as follows. In Section
Two forms of punishing strategies are introduced into the classic public goods game, called first-order and second-order punishment respectively. First-order punishers participate in the investment and punish only the first-order free riders (defectors). Second-order punishers participate in the investment and punish not only the first-order free riders but also the second-order free riders (the sole cooperators who do not punish the defectors). Let α1 and α2 (α1 > α2) denote the probabilities with which the punishing strategies punish the defection and cooperation strategies, respectively. We assume that the processes of punishing the defectors and the sole cooperators are separate. Therefore, we can consider these two random events independently. The punishment is costly. Let γ denote the unit cost of punishment for the punisher, and β the corresponding unit penalty for the individual who is punished (γ < β).
Suppose that the population contains three types of strategies: cooperation, defection, and punishment. Each time, N individuals are randomly selected from the population to participate in the public goods game. Let c denote the cost of investment and r the return on investment of the public goods (1 < r < N). Each individual chooses the strategy corresponding to its type. When a sample contains nC cooperation-type, nD defection-type, and nP punishment-type individuals, the cooperation-type individual’s payoff is rc(nC + nP)/N − c − α2βnP, the defection-type individual’s payoff is rc(nC + nP)/N − α1βnP, and the punishment-type individual’s payoff is rc(nC + nP)/N − c − α1γnD − α2γnC.
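As an illustration of the per-sample payoffs just defined, the following Python sketch evaluates them for one sampled group. The function name `sample_payoffs` and the numerical values in the usage line are illustrative assumptions and are not taken from the paper.

```python
def sample_payoffs(n_c, n_d, n_p, r, c, alpha1, alpha2, beta, gamma):
    """Return (cooperator, defector, punisher) payoffs in one sampled group.

    Hypothetical helper: it transcribes the three payoff expressions in the
    text, with N = n_c + n_d + n_p the group size.
    """
    n = n_c + n_d + n_p
    share = r * c * (n_c + n_p) / n          # equal share of the amplified pool
    coop = share - c - alpha2 * beta * n_p   # invests c, may be punished by punishers
    defect = share - alpha1 * beta * n_p     # invests nothing, punished by punishers
    punish = share - c - alpha1 * gamma * n_d - alpha2 * gamma * n_c
    return coop, defect, punish


# Illustrative values only: a group of 5 with first-order punishment (alpha2 = 0).
coop, defect, punish = sample_payoffs(
    2, 2, 1, r=4.5, c=1.0, alpha1=1.0, alpha2=0.0, beta=0.5, gamma=0.2
)
```

With these illustrative numbers the penalty is weak, so the defector still earns more than both the cooperator and the punisher, and the punisher earns the least because it bears the cost of punishing.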
Assume that the population has finite size M and that the three types of strategies (cooperation, defection, and punishment) are well mixed. Each time, N individuals are randomly selected from the population to participate in the public goods game. Obviously, M ≥ N. As in other literature on finite size populations, such as Refs. [32] and [49], we do not discuss the trivial case of M = N. Let i, j, k (i + j + k = M) denote the numbers of the three types of individuals, respectively. In the following, we analyze the expected payoff for each type of strategy in this finite size population.
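We first leave the penalty terms out of the analysis. In this case, the cooperation and punishment strategies are identical, including their payoffs. For a defector, the other N − 1 members of its group are drawn without replacement from the remaining M − 1 individuals, of whom i + k are cooperators or punishers and j − 1 are defectors. The probability that the group contains m cooperators or punishers and N − 1 − m defectors is therefore hypergeometric:
$$\binom{i+k}{m}\binom{j-1}{N-1-m}\bigg/\binom{M-1}{N-1}.$$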
Taking into account the corresponding penalty or punishing cost for each type of strategy, the above-mentioned expected payoff can be corrected as follows.
For a defection-type individual, the total penalty brought by the punishment-type strategies is [k(N − 1)/(M − 1)]α1β, where α1 is the probability with which a defector is punished by a punishment-type strategy, β is the penalty intensity, and k(N − 1)/(M − 1) is the expected number of punishment-type individuals in a sample.
For a cooperation-type individual, the total penalty brought by the punishment-type strategies is [k(N − 1)/(M − 1)]Ψ(j)α2β, where α2 is the probability with which a cooperator (second-order free rider) is punished by a punishment-type strategy:
For a punishment-type individual, the total punishing cost is [j(N − 1)/(M − 1)]α1γ + [i(N − 1)/(M − 1)]Ψ(j)α2γ, where the first and second terms represent the costs of punishing the defectors and the cooperators, respectively.
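The expected-count factor used in these penalty and cost terms can be checked numerically. The Python sketch below compares the closed form, for example k(N − 1)/(M − 1), with a direct hypergeometric average, assuming that the other N − 1 group members are drawn without replacement from the remaining M − 1 individuals. It then assembles the three terms. The function names are hypothetical, and Ψ(j) is passed in as a plain number `psi` because its definition is not reproduced in this text.

```python
from math import comb


def expected_count(n_type, M, N):
    """Closed form: expected number of one type among the N - 1 co-players."""
    return n_type * (N - 1) / (M - 1)


def expected_count_enumerated(n_type, M, N):
    """Same quantity as a hypergeometric mean (assumes sampling without replacement)."""
    total = comb(M - 1, N - 1)
    return sum(
        m * comb(n_type, m) * comb(M - 1 - n_type, N - 1 - m)
        for m in range(N)          # m = 0, ..., N - 1 co-players of this type
    ) / total


def penalty_terms(i, j, k, M, N, alpha1, alpha2, beta, gamma, psi):
    """Expected penalties and punishing cost; psi is a stand-in for Psi(j)."""
    defector_penalty = expected_count(k, M, N) * alpha1 * beta
    cooperator_penalty = expected_count(k, M, N) * psi * alpha2 * beta
    punisher_cost = (expected_count(j, M, N) * alpha1 * gamma
                     + expected_count(i, M, N) * psi * alpha2 * gamma)
    return defector_penalty, cooperator_penalty, punisher_cost


# Illustrative check with M = 20, N = 5 and k = 6 punishers.
closed = expected_count(6, 20, 5)
enumerated = expected_count_enumerated(6, 20, 5)
```

The two values agree, which is the usual mean of a hypergeometric distribution. The value of `psi` must be supplied by the caller according to the model's definition of Ψ(j).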
Thus, when the size of the population is M, and the numbers of cooperators, defectors, and punishers in the group are i, j, k, the expected payoffs of the three types of strategies are respectively
The number of individuals using each type of strategy evolves according to their payoffs. In order to describe the evolutionary process, we introduce a stochastic process z(t). Let z(t) = (z1(t), z2(t), M − z1(t) − z2(t)) denote the numbers of cooperation-, defection-, and punishment-type strategies in the population at time t, and define z(t) as the system state. For convenience, we abbreviate it as (z1(t), z2(t)). The state space of the system is S = {(i, j) | 0 ≤ i + j ≤ M; i, j ∈ ℕ}, and the number of elements in the state space is |S| = (M + 1)(M + 2)/2. Each time, individuals in the population adjust their strategies according to their expected payoffs, and this adjustment changes the system state. Our model uses the three assumptions about the bounded rationality of individuals given in Ref. [56]: inertia, myopia, and mutation. Due to inertia, we can assume that it is impossible for more than two individuals to adjust their strategies simultaneously. Myopia means that an individual, when choosing its strategy, considers only the current payoff, regardless of future payoffs. Mutation means that an individual may choose a non-optimal strategy with a small probability because of the complex decision-making environment and the limited cognitive capability of individuals.
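The state space defined above is small enough to enumerate directly. The following Python sketch lists all states (i, j) and confirms the stated count. The names `state_space` and `index` are illustrative assumptions.

```python
def state_space(M):
    """All (i, j) with i cooperators, j defectors, and M - i - j punishers."""
    return [(i, j) for i in range(M + 1) for j in range(M + 1 - i)]


M = 20
states = state_space(M)
# Map each state to its position in a probability vector over S.
index = {s: n for n, s in enumerate(states)}
# |S| = (M + 1)(M + 2) / 2, as stated in the text.
assert len(states) == (M + 1) * (M + 2) // 2
```

For M = 20 this gives 231 states, so the limit distribution of the process is a vector of 231 probabilities.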
According to the above assumptions, when the system state is (z1 (t), z2 (t)) = (i, j) ∈ S, the transfer rate of the strategy x towards strategy y can be described as
When i takes zero,
As ε > 0, this process is ergodic. According to the properties of the stochastic process, when t → + ∞, the limit of pI,I′ (t) exists and it is independent of the initial state I. Let
The Gauss–Seidel iterative algorithm described by Stewart in his monograph[57] can be used to calculate the limit distribution of the above Markov process. In the following, we show the effects of the game parameters on the stochastic stable equilibrium of the system. Through extensive numerical calculations, we find that as ε gradually decreases to zero, the limit distribution is positive only in the states (0,M,0) and (i,0,M − i) (i = 0,1,2,...,M). Thus, only these states can be stochastic stable states of the evolutionary system. State (0,M,0) indicates that all individuals choose the defection strategy; we denote it as the “All D” state. States (i,0,M − i) (i = 0,1,2,...,M) indicate the co-existence of the cooperation and punishment strategies, with all individuals choosing the punishment strategy (i = 0) and all individuals choosing the cooperation strategy (i = M) as the two extreme cases; we denote them as the “C + P” states. In the following, we fix the parameters M = 20, N = 5, c = 1, κ = 1, α1 = 1, γ = 0.2, and study the influences of β, r, and α2 on the probabilities with which the system selects the different stable equilibria.
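A minimal sketch of a Gauss–Seidel iteration for the stationary distribution of a small continuous-time Markov chain is given below. It is not the authors' implementation: the function name, the stopping rule, and the three-state generator `Q` are illustrative assumptions, and the generator for the game itself would have to be built from the transition rates of the model. The sketch also assumes that every state has a nonzero exit rate, which holds for an ergodic chain.

```python
def stationary_gauss_seidel(Q, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi Q = 0 and sum(pi) = 1 by Gauss-Seidel sweeps.

    Q is a generator matrix (list of rows): off-diagonal entries are
    transition rates and each row sums to zero.
    """
    n = len(Q)
    pi = [1.0 / n] * n                      # uniform starting guess
    for _ in range(max_iter):
        old = pi[:]
        for s in range(n):
            # Balance equation for state s, using the newest available values.
            inflow = sum(pi[u] * Q[u][s] for u in range(n) if u != s)
            pi[s] = inflow / (-Q[s][s])
        total = sum(pi)
        pi = [p / total for p in pi]        # renormalise after each sweep
        if max(abs(a - b) for a, b in zip(pi, old)) < tol:
            break
    return pi


# Toy three-state birth-death generator (illustrative, not the game model).
Q = [
    [-1.0, 1.0, 0.0],
    [2.0, -3.0, 1.0],
    [0.0, 2.0, -2.0],
]
pi = stationary_gauss_seidel(Q)
```

For this toy chain the detailed balance conditions give the stationary distribution (4/7, 2/7, 1/7), which the iteration reproduces. For the game model, `Q` would be indexed by the states in S, and the calculation would be repeated for decreasing values of ε.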
By numerical calculation, we find that for fixed r = 4.5, and for different values of α2 = 0,0.5,0.8, there are common critical values of
Figure
Figure
According to Figs.
In this paper, we introduce two types of costly punishment strategies into the traditional public goods game. Considering continuous noise in the strategy evolution process, we establish a stochastic dynamic model in a finite size population. The evolution of the system is described as a continuous and finite state multidimensional Markov process. We analyze the stochastic stable state of the evolutionary system by the limit distribution of the Markov process. The influences of parameters such as the return on investment coefficient, punishment intensity, and punishment probability on the cooperative behavior of the population are studied under first-order and second-order punishment, respectively. Unlike the most commonly used replicator dynamic models in infinite populations, the equilibrium state based on the Markov process is stochastically stable and does not depend on the initial state of the system. Unlike the Moran process based model in finite populations, the model proposed in this paper is applicable to all possible states, whereas the Moran process using the fixation probability can only analyze the probabilities of the extreme states in which all individuals choose the same strategy. Therefore, the model in this paper has strong adaptability. The analysis framework can also be applied to many other cases.